Real-time Topic Detection with Bursty N-grams
Abstract
Twitter is becoming an ever more popular platform for discovering and sharing information about current events, both personal and global. The scale and diversity of messages make the discovery and analysis of breaking news very challenging. Nonetheless, journalists and other news consumers increasingly rely on tools to help them make sense of Twitter. Here, we describe a fully automated system capable of detecting trends related to breaking news in real time. It identifies words or phrases that ‘burst’ with a sudden increase in frequency, and groups these into topics. It then identifies a diverse set of recent tweets related to each topic and uses them to create a suitable human-readable headline. In addition, images from these tweets are attached to the topic. Our system was evaluated using 24 hours of tweets as part of the Social News On the Web (SNOW) 2014 data challenge.
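The first two stages of this pipeline (spotting bursty n-grams, then grouping them into topics) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the tokenizer, the smoothed frequency-ratio burst score, the Jaccard-overlap grouping, and all names and parameters (`burst_scores`, `group_into_topics`, `n_max`, `min_count`, `min_jaccard`) are hypothetical. The sketch counts how many tweets in the current time slot contain each n-gram, compares that rate with the n-gram's average rate over earlier slots, and merges bursty n-grams that occur in largely the same tweets into one topic.

```python
import re
from collections import Counter


def ngrams(text, n_max=3):
    """Return the set of 1..n_max-grams in a lowercased tweet (hypothetical tokenizer)."""
    tokens = re.findall(r"[#@]?\w+", text.lower())
    grams = set()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.add(" ".join(tokens[i:i + n]))
    return grams


def document_frequencies(tweets, n_max=3):
    """Count in how many tweets each n-gram appears (document frequency)."""
    df = Counter()
    for t in tweets:
        df.update(ngrams(t, n_max))
    return df


def burst_scores(current, history, n_max=3, min_count=3):
    """Assumed burst score: the n-gram's tweet rate in the current slot divided
    by its add-one-smoothed mean rate over previous slots (a list of tweet lists)."""
    cur_df = document_frequencies(current, n_max)
    hist_dfs = [document_frequencies(slot, n_max) for slot in history]
    scores = {}
    for gram, count in cur_df.items():
        if count < min_count:  # ignore rare n-grams: too noisy to call a burst
            continue
        cur_rate = count / len(current)
        past = [(h[gram] + 1) / (len(slot) + 1) for h, slot in zip(hist_dfs, history)]
        base = sum(past) / len(past) if past else 1 / (len(current) + 1)
        scores[gram] = cur_rate / base
    return scores


def group_into_topics(grams, tweets, n_max=3, min_jaccard=0.5):
    """Merge n-grams whose sets of containing tweets overlap (Jaccard >= min_jaccard)
    using union-find; returns a list of topics, each a sorted list of n-grams."""
    grams = list(grams)
    tweet_sets = {g: set() for g in grams}
    for idx, t in enumerate(tweets):
        present = ngrams(t, n_max)
        for g in grams:
            if g in present:
                tweet_sets[g].add(idx)

    parent = {g: g for g in grams}

    def find(g):
        # Path-halving find for the union-find structure.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for i in range(len(grams)):
        for j in range(i + 1, len(grams)):
            a, b = tweet_sets[grams[i]], tweet_sets[grams[j]]
            if a and b and len(a & b) / len(a | b) >= min_jaccard:
                parent[find(grams[i])] = find(grams[j])

    topics = {}
    for g in grams:
        topics.setdefault(find(g), []).append(g)
    return [sorted(v) for v in topics.values()]
```

For example, if three of five tweets in the current slot mention an earthquake in Chile and neither word appears in two earlier three-tweet slots, only "earthquake" and "chile" pass `min_count=3`. Each scores 0.6 / 0.25 = 2.4, and the two merge into a single topic because they occur in exactly the same tweets. A real system would also need headline selection and tweet diversification, which this sketch omits.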
Similar resources
CLEar: A Real-time Online Observatory for Bursty and Viral Events
We describe our demonstration of CLEar (Clairaudient Ear), a real-time online platform for detecting, monitoring, summarizing, contextualizing and visualizing bursty and viral events, those triggering a sudden surge of public interest and going viral on micro-blogging platforms. This task is challenging for existing methods as they either use complicated topic models to analyze topics in an off-...
Bursty event detection from microblog: a distributed and incremental approach
As a new form of social media, microblogs (e.g., Twitter and Weibo) are playing an important role in people’s daily life. With the rise in popularity and size of microblogs, there is a need for distributed approaches that can detect bursty events with low latency from the short-text data stream. In this paper, we propose a distributed and incremental temporal topic model for microblogs called Bu...
Never Abandon Minorities: Exhaustive Extraction of Bursty Phrases on Microblogs Using Set Cover Problem
We propose a language-independent data-driven method to exhaustively extract bursty phrases of arbitrary forms (e.g., phrases other than simple noun phrases) from microblogs. The burst (i.e., the rapid increase of the occurrence) of a phrase causes the burst of overlapping N-grams including incomplete ones. In other words, bursty incomplete N-grams inevitably overlap bursty phrases. Thus, the pro...
Real Time Event Detection in Twitter
Event detection has been an important task for a long time. When it comes to Twitter, new problems are presented. Twitter data is a huge temporal data flow with much noise and various kinds of topics. Traditional sophisticated methods with a high computational complexity aren’t designed to handle such data flow efficiently. In this paper, we propose a mixture Gaussian model for bursty word extr...
Identifying Evolutionary Topic Temporal Patterns Based on Bursty Phrase Clustering
We discuss a temporal text mining task on finding evolutionary patterns of topics from a collection of article revisions. To reveal the evolution of topics, we propose a novel method for finding key phrases that are bursty and significant in terms of revision histories. Then we show a time series clustering method to group phrases that have similar burst histories, where additions and deletions...